Syntactic Blind Spots: How Misalignment Leads to LLMs' Mathematical Errors
Williamson, Dane, Ji, Yangfeng, Dwyer, Matthew
Large Language Models (LLMs) demonstrate strong mathematical problem-solving abilities but frequently fail on problems that deviate syntactically from their training distribution. We identify a systematic failure mode, syntactic blind spots, in which models misapply familiar reasoning strategies to problems that are semantically straightforward but phrased in unfamiliar ways. These errors are not due to gaps in mathematical competence, but rather reflect a brittle coupling between surface form and internal representation. To test this, we rephrase incorrectly answered questions using syntactic templates drawn from correct examples. These rephrasings, which preserve semantics while reducing structural complexity, often lead to correct answers. We quantify syntactic complexity using a metric based on Dependency Locality Theory (DLT), and show that higher DLT scores are associated with increased failure rates across multiple datasets. Our findings suggest that many reasoning errors stem from structural misalignment rather than conceptual difficulty, and that syntax-aware interventions can reveal and mitigate these inductive failures.
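The abstract's DLT-based complexity metric can be illustrated with a minimal sketch. The scoring rule below (sum of linear head-dependent distances, normalized by sentence length) is an assumed simplification of Dependency Locality Theory, and the hard-coded head indices stand in for an actual dependency parser:

```python
# Hypothetical DLT-inspired syntactic complexity score: sum of linear
# distances between each word and its syntactic head, normalized by
# sentence length (the paper's exact formulation may differ).

def dlt_score(heads):
    """heads[i] is the index of word i's head, or -1 for the root."""
    costs = [abs(i - h) for i, h in enumerate(heads) if h != -1]
    return sum(costs) / len(heads)

# "The cat sat": The -> cat (head 1), cat -> sat (head 2), sat = root.
print(dlt_score([1, 2, -1]))  # ≈ 0.667
```

Longer-distance dependencies, as in center-embedded or heavily modified phrasings, inflate this score, which is the intuition behind correlating it with failure rates.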
Retrieval-Constrained Decoding Reveals Underestimated Parametric Knowledge in Language Models
Hamdani, Rajaa El, Haffoudhi, Samy, Holzenberger, Nils, Suchanek, Fabian, Bonald, Thomas, Malliaros, Fragkiskos D.
Language models (LMs) encode substantial factual knowledge, but often produce answers judged as incorrect. We hypothesize that many of these answers are actually correct, but are expressed in alternative surface forms that are dismissed due to an overly strict evaluation, leading to an underestimation of models' parametric knowledge. We propose Retrieval-Constrained Decoding (RCD), a decoding strategy that restricts model outputs to unique surface forms. We introduce YAGO-QA, a dataset of 19,137 general knowledge questions. Evaluating open-source LMs from 135M to 70B parameters, we show that standard decoding undervalues their knowledge. For instance, Llama-3.1-70B scores only 32.3% F1 with vanilla decoding but 46.0% with RCD. Similarly, Llama-3.1-8B reaches 33.0% with RCD, outperforming the larger model under vanilla decoding. We publicly share the code and dataset at https://github.com/Rajjaa/disambiguated-LLM.
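The constraint behind RCD can be sketched with a trie of allowed answers: at each step the decoder may only emit tokens that extend some valid surface form. The word-level tokens and entity names below are illustrative assumptions; the actual method operates over the model's subword vocabulary:

```python
# Minimal sketch of decoding constrained to a trie of valid surface
# forms: only tokens that continue some allowed form may be emitted.

def build_trie(surface_forms):
    trie = {}
    for form in surface_forms:
        node = trie
        for tok in form.split():
            node = node.setdefault(tok, {})
        node["<end>"] = {}  # marks a complete surface form
    return trie

def allowed_next(trie, prefix):
    """Tokens the decoder may emit after the given prefix."""
    node = trie
    for tok in prefix:
        node = node[tok]
    return sorted(t for t in node if t != "<end>")

forms = ["Barack Obama", "Barack Obama Sr.", "Michelle Obama"]
trie = build_trie(forms)
print(allowed_next(trie, ["Barack"]))           # ['Obama']
print(allowed_next(trie, ["Barack", "Obama"]))  # ['Sr.']
```

In a real decoder the allowed set would be turned into a logit mask, so a knowledgeable model cannot be penalized for choosing an out-of-inventory paraphrase.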
Annotating and Inferring Compositional Structures in Numeral Systems Across Languages
Rubehn, Arne, Rzymski, Christoph, Ciucci, Luca, van Dam, Kellen Parker, Kučerová, Alžběta, Bocklage, Katja, Snee, David, Stephen, Abishek, List, Johann-Mattis
Numeral systems across the world's languages vary in fascinating ways, both regarding their synchronic structure and the diachronic processes that determined how they evolved in their current shape. For a proper comparison of numeral systems across different languages, however, it is important to code them in a standardized form that allows for the comparison of basic properties. Here, we present a simple but effective coding scheme for numeral annotation, along with a workflow that helps to code numeral systems in a computer-assisted manner, providing sample data for numerals from 1 to 40 in 25 typologically diverse languages. We perform a thorough analysis of the sample, focusing on the systematic comparison between the underlying and the surface morphological structure. We further experiment with automated models for morpheme segmentation, where we find allomorphy as the major reason for segmentation errors. Finally, we show that subword tokenization algorithms are not viable for discovering morphemes in low-resource scenarios.
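The idea of coding numerals in a standardized, comparable form can be sketched as annotating each surface form with an arithmetic expression over its atomic morphemes. The toy scheme and entries below are invented for illustration and are not the paper's actual annotation format:

```python
# Toy compositional coding for numerals: each surface form is annotated
# with an arithmetic expression over atomic morpheme values, so that
# underlying structure can be compared across languages.

ENGLISH = {
    "twenty-one": "20 + 1",   # surface: twenty + one
    "thirty-five": "30 + 5",  # surface: thirty + five
    "fourteen": "4 + 10",     # surface: four + -teen
}

def check_coding(coding):
    # An annotation is consistent if its expression evaluates to the
    # numeral's value; here we derive the values from the expressions.
    return {form: eval(expr) for form, expr in coding.items()}

print(check_coding(ENGLISH))
# {'twenty-one': 21, 'thirty-five': 35, 'fourteen': 14}
```

Note how "fourteen" inverts the surface order of its underlying structure (4 + 10), the kind of mismatch between underlying and surface morphology the paper analyzes systematically.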
Your Language Model May Think Too Rigidly: Achieving Reasoning Consistency with Symmetry-Enhanced Training
Yao, Yihang, Cen, Zhepeng, Li, Miao, Han, William, Zhang, Yuyou, Liu, Emerson, Liu, Zuxin, Gan, Chuang, Zhao, Ding
Large Language Models (LLMs) have demonstrated strong reasoning capabilities across various tasks. However, even minor variations in query phrasing, despite preserving the underlying semantic meaning, can significantly affect their performance. To address this, we focus on enhancing LLMs' awareness of symmetry in query variations and propose syMmetry-ENhanceD (MEND) Data Augmentation, a data-centric approach that improves the model's ability to extract useful information from context. Unlike existing methods that emphasize reasoning chain augmentation, our approach improves model robustness at the knowledge extraction stage through query augmentations, enabling more data-efficient training and stronger generalization to Out-of-Distribution (OOD) settings. Extensive experiments on both logical and arithmetic reasoning tasks show that MEND enhances reasoning performance across diverse query variations, providing new insight into improving LLM robustness through structured dataset curation.
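The query-augmentation idea can be sketched as generating semantically equivalent rephrasings that exploit a symmetry of the task. The templates below are hypothetical stand-ins; MEND's actual augmentations are more general than this commutativity example:

```python
# Illustrative symmetry-enhanced query augmentation: produce equivalent
# phrasings of an addition query, exploiting operand commutativity.

import itertools

def symmetric_variants(a, b):
    templates = [
        "What is {x} plus {y}?",
        "Add {x} and {y}.",
        "Compute the sum of {x} and {y}.",
    ]
    # Addition is commutative, so both operand orders are valid.
    return [t.format(x=x, y=y)
            for t in templates
            for x, y in itertools.permutations((a, b))]

variants = symmetric_variants(3, 5)
print(len(variants))  # 6
```

Training on all variants of each query pushes the model to extract the same information regardless of phrasing, which is the consistency property the paper targets.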
RM-PoT: Reformulating Mathematical Problems and Solving via Program of Thoughts
Zhang, Yu, Peng, Shujun, Wu, Nengwu, Lin, Xinhan, Hu, Yang, Tang, Jie
Recently, substantial advancements have been made in training language models to carry out step-by-step reasoning for solving intricate numerical reasoning tasks. Beyond the methods used to solve these problems, the structure and formulation of the problems themselves also play a crucial role in determining the performance of large language models. We observe that even small changes in the surface form of mathematical problems can have a profound impact on both the answer distribution and the solve rate. This highlights the vulnerability of LLMs to surface-level variations, revealing their limited robustness when reasoning through complex problems. In this paper, we propose RM-PoT, a three-stage framework that integrates problem reformulation (RM), code-aided reasoning (PoT), and domain-aware few-shot learning to address these limitations. Our approach first reformulates the input problem into diverse surface forms to reduce structural bias, then retrieves five semantically aligned examples from a pre-constructed domain-specific question bank to provide contextual guidance, and finally generates executable Python code for precise computation.
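The Program-of-Thoughts component can be illustrated with a hand-written example: instead of answering in natural language, the model emits Python whose execution result is the answer. The problem text and the "generated" code below are made-up illustrations, not actual model output:

```python
# Program-of-Thoughts sketch: the model's output is executable code,
# and the final answer is computed by running it rather than by the
# model's own arithmetic.

problem = "A car costs $9,500. With an 8% sales tax, what is the total price?"

generated_code = """
price = 9500
tax_rate = 0.08
answer = price * (1 + tax_rate)
"""

scope = {}
exec(generated_code, scope)  # delegate the arithmetic to the interpreter
print(scope["answer"])  # 10260.0
```

Offloading computation this way sidesteps arithmetic slips, while the reformulation stage addresses the surface-form sensitivity described above.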
Query Brand Entity Linking in E-Commerce Search
In this work, we address the brand entity linking problem for e-commerce search queries. The entity linking task is done by either i) a two-stage process consisting of entity mention detection followed by entity disambiguation or ii) an end-to-end linking approach that directly fetches the target entity given the input text. The task presents unique challenges: queries are extremely short (averaging 2.4 words), lack natural language structure, and must handle a massive space of unique brands, including (i) a Western brand name written in its original form versus its representation in Asian scripts, (ii) different surface forms for the same brand (e.g., abbreviations versus full names), and (iii) brand relationships between parent and sub-brands (e.g., a parent company and its product line brands). Therefore, in addition to recognizing the brand names mentioned in the query, it is also important to link them to the corresponding global brand entity. It would be valuable to unify the concept of brand across different e-commerce stores in a single namespace, i.e., a brand entity (an identity for each brand itself) that is unique across languages, stores, and surface forms. We present a two-stage approach combining named-entity recognition with matching, and a novel end-to-end solution using extreme multi-class classification.
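The two-stage pipeline can be sketched as longest-match mention detection followed by catalog lookup. The catalog entries and queries below are invented examples, and a production system would of course use learned NER and a fuzzy matcher rather than exact n-gram lookup:

```python
# Sketch of two-stage brand linking: (1) detect a brand mention in the
# query via longest-match over token n-grams, (2) disambiguate it by
# looking the mention up in a catalog of known surface forms.

CATALOG = {  # surface form (lowercased) -> canonical brand entity id
    "nike": "BRAND:nike",
    "ray ban": "BRAND:ray-ban",
    "ray-ban": "BRAND:ray-ban",
    "rb": "BRAND:ray-ban",  # abbreviation mapping to the same entity
}

def link_brand(query):
    tokens = query.lower().split()
    # Stage 1: try longer n-grams first so multi-word brands win.
    for n in range(len(tokens), 0, -1):
        for i in range(len(tokens) - n + 1):
            mention = " ".join(tokens[i:i + n])
            # Stage 2: disambiguation by catalog lookup.
            if mention in CATALOG:
                return mention, CATALOG[mention]
    return None, None

print(link_brand("ray ban sunglasses polarized"))  # ('ray ban', 'BRAND:ray-ban')
```

Mapping "ray ban", "ray-ban", and "rb" to one entity id is exactly the single-namespace unification the abstract argues for.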
MELO: An Evaluation Benchmark for Multilingual Entity Linking of Occupations
Retyk, Federico, Gasco, Luis, Carrino, Casimiro Pio, Deniz, Daniel, Zbib, Rabih
We present the Multilingual Entity Linking of Occupations (MELO) Benchmark, a new collection of 48 datasets for evaluating the linking of entity mentions in 21 languages to the ESCO Occupations multilingual taxonomy. MELO was built using high-quality, pre-existent human annotations. We conduct experiments with simple lexical models and general-purpose sentence encoders, evaluated as bi-encoders in a zero-shot setup, to establish baselines for future research.
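The simple lexical baselines mentioned in the abstract can be sketched as character-trigram Jaccard matching between a mention and taxonomy labels. The labels below are invented stand-ins, not actual ESCO entries:

```python
# Sketch of a lexical baseline for occupation linking: score each
# taxonomy label against the mention by Jaccard similarity over
# character trigrams, which tolerates small spelling variations.

def trigrams(s):
    s = f"  {s.lower()} "  # pad so short words still yield trigrams
    return {s[i:i + 3] for i in range(len(s) - 2)}

def link(mention, labels):
    def jaccard(a, b):
        return len(a & b) / len(a | b)
    m = trigrams(mention)
    return max(labels, key=lambda lab: jaccard(m, trigrams(lab)))

labels = ["software developer", "data analyst", "nurse"]
print(link("sofware develper", labels))  # 'software developer'
```

A bi-encoder baseline replaces the trigram sets with sentence embeddings and Jaccard with cosine similarity, but the zero-shot linking loop is the same.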
Numbers Matter! Bringing Quantity-awareness to Retrieval Systems
Almasian, Satya, Bruseva, Milena, Gertz, Michael
Quantitative information plays a crucial role in understanding and interpreting the content of documents. Many user queries contain quantities and cannot be resolved without understanding their semantics, e.g., ``car that costs less than $10k''. Yet, modern search engines apply the same ranking mechanisms for both words and quantities, overlooking magnitude and unit information. In this paper, we introduce two quantity-aware ranking techniques designed to rank both the quantity and textual content either jointly or independently. These techniques incorporate quantity information in available retrieval systems and can address queries with numerical conditions equal, greater than, and less than. To evaluate the effectiveness of our proposed models, we introduce two novel quantity-aware benchmark datasets in the domains of finance and medicine and compare our method against various lexical and neural models. The code and data are available under https://github.com/satya77/QuantityAwareRankers.
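The core idea can be sketched by parsing a numeric condition from the query and ranking only documents whose quantity satisfies it. The regex-based parser and the document values below are illustrative simplifications of the paper's joint/independent ranking models:

```python
# Sketch of quantity-aware retrieval: extract an (operator, value)
# condition from the query, then keep documents whose quantity
# satisfies it, instead of treating "10k" as an ordinary term.

import re

def parse_condition(query):
    m = re.search(r"(less|greater) than \$?(\d+)(k?)", query)
    op, num, k = m.groups()
    value = int(num) * (1000 if k else 1)  # normalize the unit suffix
    return (op, value)

def satisfies(doc_value, cond):
    op, value = cond
    return doc_value < value if op == "less" else doc_value > value

cond = parse_condition("car that costs less than $10k")
docs = {"doc1": 9500, "doc2": 12000, "doc3": 8000}
hits = sorted(d for d, v in docs.items() if satisfies(v, cond))
print(hits)  # ['doc1', 'doc3']
```

A term-based ranker would happily return doc2 because "10k" matches lexically; handling magnitude and unit explicitly is what rules it out.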
Historical Ink: 19th Century Latin American Spanish Newspaper Corpus with LLM OCR Correction
Manrique-Gómez, Laura, Montes, Tony, Manrique, Rubén
Historical newspapers, as key historical resources, contain a diverse range of information about political, economic, and cultural processes, and are abundant due to focused efforts to preserve them within national archives. Indeed, the discipline of Digital Humanities, which emphasizes the incorporation of digital tools in humanities and social sciences research, has spent much of the past three decades on the task of digitization, resulting in a wealth of curated digital collections (Berry and Fagerjord, 2017; Dobson, 2019). However, digitizing these corpora has brought plenty of challenges in transcribing the images into machine-readable texts. Another substantial project is the "Digging into Data Challenge". A part of the Transatlantic Partnership for Social Sciences and Humanities 2016, this initiative yielded a vast collection of 19th-century press materials known as "Atlas - Oceanic Exchanges. Tracing Global Information Networks in Historical Papers" (Exchanges). Other significant works include "Viral Texts: Mapping Networks of Reprinting in 19th-Century Newspapers and Magazines" (Cordell and Smith), a project that investigates 19th-century journalistic reports to understand the culture of reprinting in the United States before the Civil War.
Efficient Biomedical Entity Linking: Clinical Text Standardization with Low-Resource Techniques
Achara, Akshit, Sasidharan, Sanand, N, Gagan
Clinical text is rich in information, with mentions of treatment, medication and anatomy among many other clinical terms. Multiple terms can refer to the same core concept, which can be referred to as a clinical entity. Ontologies like the Unified Medical Language System (UMLS) are developed and maintained to store millions of clinical entities, including their definitions, relations and other corresponding information. These ontologies are used for standardization of clinical text by normalizing varying surface forms of a clinical term through biomedical entity linking. With the introduction of transformer-based language models, there has been significant progress in biomedical entity linking. In this work, we focus on learning through synonym pairs associated with the entities. Compared to existing approaches, our approach significantly reduces the training data and resource consumption. Moreover, we propose a suite of context-based and context-less reranking techniques for performing entity disambiguation. Overall, we achieve performance similar to the state-of-the-art zero-shot and distantly supervised entity linking techniques on the MedMentions dataset, the largest annotated dataset on UMLS, without any domain-based training. Finally, we show that retrieval performance alone might not be sufficient as an evaluation metric, and we introduce an article-level quantitative and qualitative analysis to reveal further insights into the performance of entity linking methods.
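The synonym-pair normalization plus context-less reranking can be sketched as exact lookup with a string-similarity fallback. The synonym table and concept ids below are invented toy entries (real UMLS holds millions), and difflib stands in for the paper's learned reranker:

```python
# Sketch of biomedical entity linking: map a mention to a concept id
# via an exact synonym lookup, falling back to context-less reranking
# of all known surface forms by string similarity.

import difflib

SYNONYMS = {  # surface form -> concept id (hypothetical toy CUIs)
    "heart attack": "C0027051",
    "myocardial infarction": "C0027051",
    "hypertension": "C0020538",
    "high blood pressure": "C0020538",
}

def link(mention):
    mention = mention.lower()
    if mention in SYNONYMS:  # exact synonym match
        return SYNONYMS[mention]
    # Context-less rerank: pick the most similar known surface form.
    best = max(SYNONYMS,
               key=lambda s: difflib.SequenceMatcher(None, mention, s).ratio())
    return SYNONYMS[best]

print(link("myocardial infarction"))  # C0027051
print(link("high blood presure"))     # C0020538 (typo still resolves)
```

Because both "heart attack" and "myocardial infarction" map to one concept id, varying surface forms of the same clinical entity are normalized to a single standard identifier.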